A dependency measure based on the Jensen-Shannon divergence
They define a dependency measure between two random variables, which is based on the Jensen-Shannon divergence.
The Kullback-Leibler (KL) divergence of a distribution $p$ from a distribution $q$ is defined as follows:
$KL(p||q)=\sum_{i \in A} p_i \log \frac{p_i}{q_i}$
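As a concrete illustration, here is a minimal Python sketch of this definition (using the natural logarithm and the convention $0 \log 0 = 0$); the function name and the example distributions are just placeholders, not taken from the paper:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p||q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two distributions over the same support A = {0, 1, 2}
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # positive, and in general != kl_divergence(q, p)
```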
The mutual information between two jointly distributed random variables $X$ and $Y$ is defined as the KL divergence of the joint distribution $p(x,y)$ from the product $p(x)p(y)$ of the marginal distributions of $X$ and $Y$. This means:
$I(X;Y)=KL(p(x,y)||p(x)p(y))$
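Continuing the sketch above (reusing the hypothetical kl_divergence helper), mutual information can be computed from a joint distribution table by comparing it with the product of its marginals:

```python
def mutual_information(joint):
    """I(X;Y) = KL(p(x,y) || p(x)p(y)) for a joint distribution given as a 2-D table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    return kl_divergence(joint.ravel(), (px * py).ravel())

# Perfectly correlated variables: I(X;Y) = H(X) = log 2 (in nats)
print(mutual_information([[0.5, 0.0],
                          [0.0, 0.5]]))    # ~0.693

# Independent variables: the joint equals the product of marginals, so I(X;Y) = 0
print(mutual_information([[0.25, 0.25],
                          [0.25, 0.25]]))  # 0.0
```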
Similarly to the way mutual information is built from the KL divergence (it is the expected PMI), they use an analogous measure based on the Jensen-Shannon divergence, defined as:
$$JS_\alpha (p,q)= \alpha KL(p||r) + (1-\alpha) KL(q||r) = H(r)-\alpha H(p) - (1-\alpha) H(q)$$
where $r=\alpha p + (1-\alpha) q$ and $H(p)$ is the entropy function (i.e., $H(p)=-\sum_i p_i \log p_i$).
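The sketch below (again reusing the hypothetical kl_divergence helper) computes $JS_\alpha$ from this definition and numerically checks the entropy identity; the example distributions and the value of $\alpha$ are arbitrary:

```python
def entropy(p):
    """H(p) = -sum_i p_i * log(p_i), with 0*log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def js_alpha(p, q, alpha=0.5):
    """JS_alpha(p,q) = alpha*KL(p||r) + (1-alpha)*KL(q||r) with r = alpha*p + (1-alpha)*q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    r = alpha * p + (1 - alpha) * q
    return alpha * kl_divergence(p, r) + (1 - alpha) * kl_divergence(q, r)

p, q, alpha = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2], 0.3
r = alpha * np.asarray(p) + (1 - alpha) * np.asarray(q)
print(js_alpha(p, q, alpha))                                        # the divergence itself
print(entropy(r) - alpha * entropy(p) - (1 - alpha) * entropy(q))   # same value via the identity
```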
They define the Jensen-Shannon Mutual Information (JSMI) as follows:
$JSMI_\alpha (X,Y) = JS_\alpha (p(x,y),p(x)p(y))$
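A minimal sketch of this definition, mirroring the mutual-information example above (the js_alpha helper is the hypothetical one defined earlier, not the authors' code):

```python
def jsmi(joint, alpha=0.5):
    """JSMI_alpha(X,Y) = JS_alpha(p(x,y), p(x)p(y)) for a joint distribution table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    return js_alpha(joint.ravel(), (px * py).ravel(), alpha)

# Dependent variables give JSMI > 0; independent variables give JSMI = 0.
print(jsmi([[0.4, 0.1],
            [0.1, 0.4]], alpha=0.5))   # > 0
print(jsmi([[0.25, 0.25],
            [0.25, 0.25]], alpha=0.5)) # 0.0
```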
As can be seen, the embedding score approaches the optimal value with higher dimensionality and more training iterations, but does not surpass it. They showed that optimizing skip-gram embeddings with negative sampling finds the best low-dimensional approximation of the JSMI measure.